Add support for the atom chunk encoding in the upcoming Erlang 28 #32

Open
tomas-abrahamsson wants to merge 5 commits into klajo:main from tomas-abrahamsson:erlang-28

@tomas-abrahamsson
Contributor

In Erlang 27 and earlier, the atoms in the AtU8 chunk are encoded as one byte of length, followed by the UTF-8 encoding of the atom text.

In Erlang 28 this changes: the length takes either one or two bytes. The length encoding is a bit involved, apparently because it reuses a scheme originally introduced when the JIT was added, but it makes it possible to specify the length of an atom even when it consists of code points that encode to several bytes in UTF-8. In Erlang 28 the chunk id is still AtU8, but the number of atoms is negative instead of positive.
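To make the two layouts concrete, here is a minimal sketch of a reader for the AtU8 chunk payload (the bytes after the chunk id and size). It is written in Python purely for illustration and is not the code in this PR. It assumes the new length uses the BEAM compact term encoding with tag 0, that the count is a signed 32-bit big-endian integer, and that lengths count UTF-8 bytes; the function names are hypothetical.

```python
import struct


def decode_atom_len(buf, pos):
    """Return (length, next_pos) for an Erlang 28 style atom length.

    Assumption: compact term encoding with tag 0 in the low three bits.
    """
    first = buf[pos]
    if first & 0x07 != 0:
        raise ValueError("expected tag 0 for an atom length")
    if first & 0x08 == 0:
        # One byte: the length (0..15) sits in the upper four bits.
        return first >> 4, pos + 1
    if first & 0x10 == 0:
        # Two bytes: three high bits in the first byte, low eight in the second.
        return ((first & 0xE0) << 3) | buf[pos + 1], pos + 2
    raise ValueError("length does not fit in two bytes")


def parse_atom_table(payload):
    """Decode the atoms from an AtU8 payload in either format."""
    (count,) = struct.unpack_from(">i", payload, 0)
    pos = 4
    new_format = count < 0  # a negative count marks the Erlang 28 layout
    atoms = []
    for _ in range(abs(count)):
        if new_format:
            length, pos = decode_atom_len(payload, pos)
        else:
            # Erlang 27 and earlier: a single length byte.
            length, pos = payload[pos], pos + 1
        atoms.append(payload[pos:pos + length].decode("utf-8"))
        pos += length
    return atoms
```

Dispatching on the sign of the count lets one reader handle files from both Erlang 27 and Erlang 28 without looking at anything else in the file.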

I made some more changes:

The GitHub Actions workflow now tests Erlang 27 and no longer tests Erlang 23.

I dropped the code that handles latin-1 encoded atoms when the chunk id is Atom. The AtU8 chunk for UTF-8 encoded atoms was introduced in Erlang 20. I don't know when Erlang dropped support for the old chunk, but in Erlang 27, support for reading beam files from Erlang 24 and older was dropped, if I remember right.

Instead of constructing the chunks and the top-level form with updated chunk lengths and padding, I switched to beam_lib:build_module, which has been around since Erlang 18. So the code now only needs to concern itself with the atom chunk. (As a side note, the only beam_lib functionality I found for the atom chunk is getting a list of atoms from it, beam_lib:chunks(Beam,[atoms]); there is no function to make a chunk from a list of atoms, so the code still needs to know the encoding of the atom chunk.)

Since the code now uses beam_lib to traverse the chunks and put them back together into a module, there is no longer any strong reason to document the beam file format. My intention was to replace that documentation with a link to the official description of the format, but I could not find one anywhere. In the end, I linked to a copy on the Wayback Machine.
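Since no beam_lib function builds an atom chunk from a list of atoms, the writing side has to produce the bytes itself. The following is a rough Python sketch of that step, not the implementation in this PR. It assumes the same layout as described at the top of this PR (signed 32-bit big-endian count, negative for the Erlang 28 format, lengths in UTF-8 bytes, new lengths in the compact term encoding with tag 0); `encode_atom_len` and `build_atom_table` are hypothetical names.

```python
import struct


def encode_atom_len(n):
    """Encode an atom length in the assumed Erlang 28 layout."""
    if n < 16:
        # Fits in one byte: length in the upper four bits, tag 0 below.
        return bytes([n << 4])
    if n < 2048:
        # Two bytes: bit 3 set, three high bits of n in the top of byte one.
        return bytes([((n >> 3) & 0xE0) | 0x08, n & 0xFF])
    raise ValueError("length does not fit in two bytes")


def build_atom_table(atoms, new_format):
    """Build an AtU8 payload (without chunk id, size or padding)."""
    count = -len(atoms) if new_format else len(atoms)
    parts = [struct.pack(">i", count)]
    for atom in atoms:
        text = atom.encode("utf-8")
        if new_format:
            parts.append(encode_atom_len(len(text)))
        else:
            if len(text) > 255:
                raise ValueError("atom too long for a one-byte length")
            parts.append(bytes([len(text)]))
        parts.append(text)
    return b"".join(parts)
```

The chunk id, size field and padding are left out on purpose, since those are the parts that beam_lib:build_module takes care of.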

@tomas-abrahamsson
Contributor Author

Erlang 28.0 was released yesterday, so I added another commit on top of this PR to include it in the GitHub workflow.

Update supported versions in README.md
In Erlang 20, it was replaced by the AtU8 chunk
to support unicode atoms.
Use the beam_lib module to process chunks, this takes care of size
calculations, padding and more for us.

Since we now use beam_lib to process chunks, drop the documentation of
the beam file format, as we no longer use this info. Replace it with
a link to the beam file format.
The length of atoms changes in Erlang 28: Atom lengths of
up to 15 fit in one octet, but longer lengths require 2 octets.
Support both old and new formats.
Update supported versions in README.md
@tomas-abrahamsson
Contributor Author

I had forgotten to update the supported-versions badge in README.md in the two commits that bump the OTP versions in the GitHub workflow.
I fixed that and force-pushed.
